Adaptive Cross-Modal Embeddings for Image-Text Alignment
Similar Resources
Cross-modal Retrieval by Text and Image Feature Biclustering
We describe our approach to the ImageCLEF-Photo 2007 task. The novelty of our method consists of biclustering image segments and annotation words. Given the query words, we may select the image segment clusters that have the strongest co-occurrence with the corresponding word clusters. These image segment clusters act as the selected segments relevant to a query. We rank text hits by our own tf.idf ...
Cross-modal domain adaptation for text-based regularization of image semantics in image retrieval systems
In query-by-semantic-example image retrieval, images are ranked by similarity of semantic descriptors. These descriptors are obtained by classifying each image with respect to a pre-defined vocabulary of semantic concepts. In this work, we consider the problem of improving the accuracy of semantic descriptors through cross-modal regularization, based on auxiliary text. A cross-modal regularizer...
Adaptive Color-Image Embeddings for Database Navigation
Proceedings of the 1998 IEEE Asian Conference on Computer Vision, Hong Kong. We present a novel approach to the problem of navigating through a database of color images for the purpose of image retrieval. We endow the database with a metric for the color distributions of the images. We then use multi-dimensional scaling techniques to embed a group of images as points in a two-dimensional Euclide...
Cross-modal Embeddings for Video and Audio Retrieval
The increasing amount of online videos brings several opportunities for training self-supervised neural networks. The creation of large-scale video datasets such as YouTube8M allows us to deal with this large amount of data in a manageable way. In this work, we find new ways of exploiting this dataset by taking advantage of the multi-modal information it provides. By means of a neural net...
Learning Deep Semantic Embeddings for Cross-Modal Retrieval
Deep learning methods have been actively researched for cross-modal retrieval, with the softmax cross-entropy loss commonly applied for supervised learning. However, the softmax cross-entropy loss is known to result in large intra-class variances, which is not well suited to cross-modal matching. In this paper, a deep architecture called Deep Semantic Embedding (DSE) is proposed, which is ...
Journal
Journal title: Proceedings of the AAAI Conference on Artificial Intelligence
Year: 2020
ISSN: 2374-3468,2159-5399
DOI: 10.1609/aaai.v34i07.6915